[#11625][fix] handle list-typed eos_token_id in SamplingParams._setup (#13178)

Open — edenfunf wants to merge 2 commits into NVIDIA:main
Conversation
HuggingFace tokenizers for models like Llama 3.1 may return a list for eos_token_id (e.g. [128001, 128009]). SamplingParams._setup previously assigned this list directly to self.end_id, which is typed Optional[int]. When the C++ binding (tllm.Request) received a list instead of an int, it raised std::bad_cast, crashing trtllm-serve for every inference request.

Fix: when eos_token_id is a list, use the first element as end_id (scalar) and append the remaining tokens to stop_token_ids so that generation still halts on all configured EOS tokens.

Fixes NVIDIA#11625

Signed-off-by: 許元豪 <146086744+edenfunf@users.noreply.github.com>
Contributor
Actionable comments posted: 1
Inline comments:
In `@tests/unittest/llmapi/test_sampling_params.py`:
- Line 1: Update the copyright header string "Copyright 2025 NVIDIA CORPORATION & AFFILIATES" in this new test file to the latest meaningful modification year (change 2025 to 2026) so the file header matches the repository policy; locate the header line at the top of tests/unittest/llmapi/test_sampling_params.py and replace the year accordingly.
📒 Files selected for processing (2)

- tensorrt_llm/sampling_params.py
- tests/unittest/llmapi/test_sampling_params.py
What
SamplingParams._setup() crashed trtllm-serve with std::bad_cast whenever the tokenizer's eos_token_id was a list instead of a scalar int.

Before:

After:
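The original Before/After snippets did not survive this page. A minimal sketch of the change, assuming the attribute names used elsewhere in this PR (end_id, stop_token_ids), might look like:

```python
eos_token_id = [128001, 128009]  # what newer tokenizers may return

# Before (sketch): the list was assigned straight into the int-typed slot,
# which later crashed the C++ binding:
#   self.end_id = eos_token_id

# After (sketch): normalize to a scalar plus extra stop tokens.
if isinstance(eos_token_id, list):
    end_id = eos_token_id[0]
    extra_stop_tokens = list(eos_token_id[1:])
else:
    end_id = eos_token_id
    extra_stop_tokens = []

print(end_id, extra_stop_tokens)  # 128001 [128009]
```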
Why
HuggingFace tokenizers for models like Llama 3.1 expose eos_token_id = [128001, 128009] (a List[int]). _setup() blindly assigned this list to self.end_id, which is typed Optional[int]. Downstream, base_worker.py passes end_id directly into tllm.Request(end_id=...), a C++ nanobind binding that expects int32_t. PyBind/nanobind cannot cast a Python list to int32_t, so it raises std::bad_cast, killing the server for every inference request on any model whose tokenizer advertises multiple EOS tokens.
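The failure mode can be illustrated without nanobind: any strict C integer slot rejects a Python list. Here is a loose analogy using the standard-library ctypes (an analogy only, not the actual tllm.Request code path):

```python
import ctypes

# A C-level int32 happily accepts a scalar token id...
end_id = ctypes.c_int32(128001)
print(end_id.value)  # 128001

# ...but a list cannot be converted to int32_t, much as the nanobind
# binding fails when tllm.Request(end_id=[128001, 128009]) is attempted.
try:
    ctypes.c_int32([128001, 128009])
except TypeError as exc:
    print("rejected:", exc)
```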
Why is this PR open if issue #11625 is closed?
Issue #11625 was closed by a maintainer who was unable to reproduce the crash on their own hardware ("I'm closing this issue. Please feel free to open a new one if you need~"). It was not closed because a fix was merged. The original reporter confirmed the crash persisted after the closure.

upstream/main still contains the unfixed assignment as of this PR. The root cause is version-dependent: newer versions of transformers (≥ 4.43) began returning List[int] from tokenizer.eos_token_id for models whose tokenizer_config.json has a list-valued eos_token_id. Earlier versions always returned a scalar, which is why the bug was not universally reproducible.
How
- Read tokenizer.eos_token_id into a local variable first.
- If it is a list, use eos_token_id[0] as the scalar end_id and append eos_token_id[1:] to stop_token_ids so that generation terminates on all configured EOS tokens, not just the first one.
- The existing generation_config.eos_token_id list-normalisation logic (lines 436-447) is untouched; de-duplication between the two paths is handled by the existing stop_token not in self.stop_token_ids guard.
- The change is minimal and isolated to the 14 new lines inside _setup(). No other call sites, no architecture changes.
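The steps above, including the de-duplication guard between the two paths, can be condensed into a short sketch. The helper name and signature here are hypothetical; the actual code lives inline in _setup() and may differ:

```python
def normalize_eos(eos_token_id, stop_token_ids):
    """Return a scalar end_id; fold any extra EOS tokens into stop_token_ids.

    Hypothetical helper mirroring the behaviour this PR describes, not the
    real tensorrt_llm code.
    """
    if isinstance(eos_token_id, list):
        end_id = eos_token_id[0]
        for stop_token in eos_token_id[1:]:
            # Same guard the existing generation_config path uses, so a
            # token seen by both paths is not appended twice.
            if stop_token not in stop_token_ids:
                stop_token_ids.append(stop_token)
    else:
        end_id = eos_token_id
    return end_id

stops = []
print(normalize_eos([128001, 128009], stops), stops)  # 128001 [128009]

stops = [128009]  # already added by the generation_config path
print(normalize_eos([128001, 128009], stops), stops)  # 128001 [128009]
```

The guard keeps the tokenizer path and the generation_config path from producing duplicate stop tokens, as the description notes.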
Test
Before fix (reproduction):

After fix:

params.end_id == 128001 (int), params.stop_token_ids == [128009].

Regression tests added in tests/unittest/llmapi/test_sampling_params.py (pure Python, no GPU or model weights required):

- test_setup_scalar_eos_token_id
- test_setup_list_eos_token_id_single_element
- test_setup_list_eos_token_id_multiple_elements (end_id is int)
- test_setup_list_eos_token_id_no_duplicate_stop_tokens
- test_setup_end_id_already_set_ignores_tokenizer_eos (end_id not overwritten)
- test_setup_list_eos_token_id_pad_fallback (pad_id fallback still an int)

All pre-commit hooks passed (CRLF, ruff, ruff-format, codespell, DCO).